Most known implementations of the Advanced Encryption Standard (AES) Sbox aim at minimizing
chip covered area or achieving high throughput, and power consumption is usually a secondary
metric of their cost. However, the need for low power applications with strict chip covered area
constraints is great, especially in mobile devices. In this paper, low power architectures in
limited area resources are proposed for the Sbox, the basic cryptographic primitive of AES. These
architectures support both encryption and decryption operation modes. In our proposed
implementations, while retaining a small chip covered area cost, we apply hardware techniques for
low power design, such as re-ordering of the components, introduction of redundant hardware
(ideal delay line), reduction of the driving strength (fan-out), and insertion of registers at
highly unbalanced points of the circuit. As a result, our implementations not only reduce power
consumption by a large degree, but also have good area properties and offer an advantageous
power-delay-area product in comparison with other known Sbox implementations. These properties
allow our system to support low power mobile devices in low area system environments.

Keywords: Advanced Encryption Standard, Sbox Transformation, Cryptography, VLSI, 
Hardware, Wireless Networks, Low-end Devices. 



1. INTRODUCTION 

In January 1997, the National Institute of Standards and
Technology (NIST) invited new algorithm proposals for
the Advanced Encryption Standard (AES) in order to
replace the old Data Encryption Standard (DES). After
two rounds of evaluation of the 15 candidate algorithms,
NIST selected Rijndael as the AES algorithm in October
2000 [1]. Since NIST adopted the AES algorithm as the basic
standard for symmetric cryptography, AES has become the
main symmetric algorithm in many communication protocols
and applications. The AES algorithm has broad applications,
including mobile phones, smart cards, RFID tags, WWW
servers and automated teller machines (ATMs).

The more widely used wireless protocols [2, 3] adopted
the AES algorithm as their basic security mechanism [4] for
providing system authentication, authorization and data
integrity. However, in such systems, the constraints on
power dissipation and chip covered area are very strict.
Recent mobile devices (laptops, PDAs, etc.) can operate
for just a few hours before the battery is exhausted. Even
worse, the gap between the power requirements of electronic
components and battery capacities is expected to
increase in the near future [5, 6]. New passive devices such as
contactless smart cards and RFID tags [7], which extract power
from electromagnetic fields, make the problem more complicated.
Hardware design is known to accelerate algorithmic
processes and decrease power requirements.
Therefore, dedicated supplementary hardware is
essential [8] if the designer's goal is to construct power
efficient systems in limited area resources, such as mobile
systems.

The existing implementations of AES do not focus on
the problem of power consumption but rather on presenting
high throughput architectures [9, 10]. Such architectures do
not reach their full potential in mobile systems, since such
systems do not have high throughput specifications. In
this paper we propose several hardware implementations
of the AES S-box, which constitutes the main cryptographic
primitive and the main source of power consumption
in the AES algorithm. It has been shown [11, 12] that
the power dissipation of the Sbox cryptographic
primitive is about 75 percent of the overall AES
power dissipation. So by improving the power characteristics
of the Sbox component we improve the power efficiency
of the algorithm significantly. Many works [13-16] try
to give a more simplified architecture of the AES Sbox in
terms of chip covered area resources, ignoring that area
reduction is not proportional to power reduction. However,
in mobile and wireless devices power consumption is the
main constraint that has to be reduced. Achieving lower
power consumption increases the battery efficiency of the
system and gives the opportunity to build more
complex systems. In this paper we propose several Sbox
implementations and apply VLSI techniques to them,
achieving low power operation in limited area
resources.

Fig. 1. Encryption and decryption processes of AES Algorithm.

A brief basic structure of the standard AES is given 
in Section 2. In Section 3 the S-box transformation is 
presented in detail. In Section 4 the most significant 
approaches in S-Box design are reported. The proposed 
architecture is presented in Section 5. Experimental results 
and Comparisons are presented in Section 6. Finally, the 
paper is concluded in Section 7. 

2. AES ALGORITHM 



The AES algorithm is a symmetric-key cipher, in which
both the sender and the receiver use a single key for
encryption and decryption. The data block length is fixed
at 128 bits, while the key length can be 128, 192, or
256 bits. In addition, the AES algorithm is an
iterative algorithm. Each iteration is called a round,
and the total number of rounds, $N_r$, is 10, 12, or 14 when
the key length is 128, 192, or 256 bits, respectively. The
128-bit data block is divided into 16 bytes. These bytes
are mapped to a 4 x 4 array called the State, and all the
internal operations of the AES algorithm are performed
on the State. Each byte in the State is denoted by $S_{i,j}$
($0 \le i, j < 4$) and is considered as an element of $GF(2^8)$.
Although different irreducible polynomials can be used to
construct the $GF(2^8)$ field, the irreducible polynomial used in
the AES algorithm is $p(x) = x^8 + x^4 + x^3 + x + 1$. Figure 1
shows the block diagram of the AES encryption and the
equivalent decryption structures.

In the encryption of the AES algorithm, each round
except the final round consists of four transformations:
the SubBytes, the ShiftRows, the MixColumns, and the
AddRoundKey, while the final round does not have the
MixColumns transformation.

• SubBytes: a non-linear substitution step where each
byte is replaced with another according to a lookup table.

• ShiftRows: a transposition step where each row of
the state is shifted cyclically a certain number of
steps.

• MixColumns: a mixing operation which operates on
the columns of the state, combining the four bytes in each
column using a linear transformation.

• AddRoundKey: each byte of the state is combined with
the round key; each round key is derived from the cipher
key using a key schedule.

Fig. 2. The effect of the SubBytes() transformation on the State.
Fig. 3. SubBytes() applies the S-box to each byte of the State.

The previous Cipher transformations can be inverted 
and then implemented in reverse order to produce a 
straightforward Inverse Cipher for the AES algorithm. 
The individual transformations used in the Inverse Cipher 
are InvShiftRows, InvSubBytes, InvMixColumns, and 
AddRoundKey. 
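
The round structure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the column-major State layout, the left shift of row r by r positions and the initial AddRoundKey before the first round are taken from the AES specification rather than from the text above, and no transformation other than ShiftRows is actually implemented.

```python
# Number of rounds N_r for each key length, as given in the text.
ROUNDS = {128: 10, 192: 12, 256: 14}

def bytes_to_state(block):
    """Map a 16-byte block to the 4 x 4 State (column-major, per the AES spec)."""
    assert len(block) == 16
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]

def shift_rows(state):
    """Cyclically shift row r of the State to the left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def round_sequence(key_bits):
    """List the transformation names applied during encryption."""
    nr = ROUNDS[key_bits]
    seq = ["AddRoundKey"]              # initial key addition (AES spec)
    for rnd in range(1, nr + 1):
        seq += ["SubBytes", "ShiftRows"]
        if rnd != nr:                  # the final round has no MixColumns
            seq.append("MixColumns")
        seq.append("AddRoundKey")
    return seq

state = bytes_to_state(list(range(16)))
shifted = shift_rows(state)
```

For a 128-bit key, `round_sequence(128)` contains ten SubBytes steps, which is why the S-box dominates the datapath activity discussed in the rest of the paper.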

3. SUBBYTES TRANSFORMATION

The SubBytes() (Ref. [1]) transformation is a non-linear 
byte substitution that operates independently on each byte 
of the State using a substitution table (S-box). This S-box 
(Fig. 4), which is invertible, is constructed by composing 
two transformations: 

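(1) Take the multiplicative inverse of the input in the finite field
$GF(2^8)$; the element {00} is mapped to itself.

(2) Apply the following affine transformation (over
$GF(2)$):

$$
b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i,
\quad 0 \le i < 8 \tag{1}
$$

where $b_i$ is the $i$-th bit of the byte, and $c_i$ is the $i$-th bit of a
byte $c$ with value {63} or {01100011}. Here and elsewhere,
a prime on a variable (e.g., $b'$) indicates that the variable
is to be updated with the value on the right. In matrix
form, the affine transformation element of the S-box can
be expressed as:

$$
\begin{bmatrix} b'_0 \\ b'_1 \\ b'_2 \\ b'_3 \\ b'_4 \\ b'_5 \\ b'_6 \\ b'_7 \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \end{bmatrix}
\oplus
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
$$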

The S-box used in the SubBytes() transformation is presented
in hexadecimal form in Figure 4. For example, if
$S_{i,j}$ = {53}, then the substitution value would be determined
by the intersection of the row with index '5' and
the column with index '3' in Figure 4. This would result
in $S'_{i,j}$ having a value of {ed}.

InvSubBytes() is the inverse of the byte substitution
transformation, in which the inverse Sbox is applied to
each byte of the State. This is obtained by applying the
inverse of the affine transformation followed by taking the
multiplicative inverse in $GF(2^8)$.
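
As a software cross-check of the two-step construction above, the following Python sketch computes the S-box from the field inversion and the affine transformation of Eq. (1). It is a reference model only, not one of the hardware architectures discussed in this paper. The function names are illustrative, and computing the inverse as the 254th power is simply one convenient method (every nonzero element of the field satisfies x^255 = 1); it is an assumption of this sketch, not something the text prescribes.

```python
AES_POLY = 0x11B  # p(x) = x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    """Multiply two elements of GF(2^8) modulo the AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # reduce as soon as the degree reaches 8
            a ^= AES_POLY
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse in GF(2^8); {00} is mapped to itself."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):       # a^254 = a^(-1) for nonzero a
        result = gf_mul(result, a)
    return result

def affine(b, c=0x63):
    """Affine transformation of Eq. (1), applied bit by bit."""
    out = 0
    for i in range(8):
        bit = (b >> i) ^ (c >> i)
        for k in (4, 5, 6, 7):
            bit ^= b >> ((i + k) % 8)
        out |= (bit & 1) << i
    return out

def sbox(x):
    return affine(gf_inv(x))

SBOX = [sbox(x) for x in range(256)]
# The inverse S-box is obtained here by inverting the table.
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x
```

With this model, `sbox(0x53)` reproduces the value {ed} used in the example above.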

4. PREVIOUS SBOX IMPLEMENTATIONS 

The simplest implementation of the Sbox transformation
is the integration in silicon of the values presented
in Figures 4 and 5 using specific lookup tables. In
this approach, the mathematical background of the Sbox
transformation is not utilized for optimization in any
way. The optimization of such an implementation
relies exclusively on the synthesis tool and the
libraries used. Therefore, the power consumption also
depends on the synthesizer libraries. Since the synthesizer
can run a highly sophisticated optimization process,
the resulting circuit has small critical path delay
and low power consumption. However, the resulting area
of a lookup table Sbox transformation circuit is very
high compared to other similar designs (about three times
higher) [11, 13, 14, 16].
Fig. 4. S-box: substitution values for the byte xy (in hexadecimal format).
Fig. 5. Inverse S-box: substitution values for the byte xy (in hexadecimal format).



A similar implementation approach replaces the AES Sbox
transformation with a ROM. In this case, the critical path
delay is very similar to that of the hardware lookup table
approach, but the power consumption of the ROM is about
2-3 orders of magnitude higher than that of the lookup
table approach.

A third approach for implementing the AES S-box
was proposed by Bertoni et al. in Ref. [15]. By using
an intermediate one-hot encoding of the input, arbitrary
logic functions (including cryptographic S-boxes) can be
realized with minimal power consumption. An Sbox
transformation designed with this approach would
consist of an 8 to $2^8$ encoder, a $2^8$ to 8 decoder and
an appropriate mapping of the encoder's output to the
decoder's input. The main drawback of this approach is
its area cost. The implementation supports only
the encryption mode, so an identical system is required
for decryption. For this reason the resulting implementation
employs a large silicon area on chip, considerably higher
than that of the conventional lookup table approach and of
the arithmetic implementation approaches presented later.
An additional disadvantage of Bertoni's implementation is that the
architecture is not flexible and cannot be easily modified.
In order to make this architecture faster, a considerable
number of latches must be inserted in the data
flow of the system, thus increasing the chip covered area
dramatically. This problem makes the above implementation
unsuitable for mobile user devices with limited area
resources.
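
The one-hot idea can be illustrated with a small behavioural model in Python. This is a hedged sketch, not the circuit of Ref. [15]: the function names are hypothetical, and `TABLE` is an arbitrary stand-in byte permutation used only to keep the example short (the real design would wire the AES S-box values instead). The point it shows is that, between the two converters, exactly one of the 256 lines is active, so any input change switches only two lines.

```python
def to_one_hot(x):
    """8-bit value -> 256-line one-hot word (line x is high)."""
    return 1 << x

def permute(lines, table):
    """Fixed wiring: line i of the input drives line table[i] of the output."""
    out = 0
    for i, target in enumerate(table):
        if (lines >> i) & 1:
            out |= 1 << target
    return out

def from_one_hot(lines):
    """256-line one-hot word -> 8-bit value (index of the active line)."""
    return lines.bit_length() - 1

def one_hot_substitute(x, table):
    return from_one_hot(permute(to_one_hot(x), table))

def switched_lines(a, b):
    """Number of one-hot lines that toggle when the input changes from a to b."""
    return bin(to_one_hot(a) ^ to_one_hot(b)).count("1")

# Stand-in permutation (an assumption of this sketch, NOT the AES S-box).
TABLE = [(7 * i + 3) % 256 for i in range(256)]
```

The model also makes the area cost visible: the wiring stage handles 256 lines for a single direction, and a second, separate mapping would be needed for the inverse substitution.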

There are more advanced approaches to designing the Sbox
transformation. These approaches take advantage of the
arithmetic properties of the $GF(2^8)$ field (arithmetic
implementation approaches). The main issue in such implementations
is the efficient realization of the inversion in $GF(2^8)$, which
can be achieved by decomposing the finite field into the
sub-fields $GF(2^4)$ and $GF(2^2)$. An inversion in a finite
field of characteristic 2 can be carried out in various ways,
according to the basis employed to represent the elements of
the field. The main advantage of these implementations
is their small chip covered area cost, because both
encryption and decryption employ the same silicon core.
However, the reduction in chip covered area does not lead
to a corresponding reduction in power dissipation. On the
contrary, in those arithmetic approaches the power dissipation
is not considered. Due to possible glitches in the
internal components and in the overall circuit topology (network),
power dissipation can be 10-30 times higher than
in the lookup table approach [17].

In this paper, we propose a methodology that can be
used to synthesize cryptographic Sboxes on custom silicon
(ASIC) libraries with energy efficiency as a primary
goal. For comparison reasons we implement a lookup table
S-Box along with a representative arithmetic implementation
by Wolkerstorfer et al. [13], and we propose four
optimized implementations based on Ref. [13].

5. PROPOSED IMPLEMENTATION 

As described in Section 4, the basic advantage of arithmetic
implementations of the S-Box cryptographic primitive
is their limited chip covered area. This is due to
the ability of those implementations to support
both encryption and decryption using the same logic circuit.
As already analyzed, the main problem of arithmetic
implementations is their increased power dissipation compared
to other implementations. We propose a new implementation
approach that combines low power dissipation
and low chip covered area for applications with limited
resources. The proposed AES implementation is based
on a modified S-Box [13] that can perform both encryption
and decryption, and on the arithmetic of the $GF(2^8)$ finite
field. The aim of this work is to provide an area
cost-effective solution for an AES implementation with
low-power constraints. This means that power dissipation is
one of the main design parameters, and cost (i.e., redundant
hardware) can be traded to achieve it. To
the best of our knowledge, no implementation that combines
low power dissipation with low area specifications has been
proposed before. Four alternative implementations, named
Implem1, Implem2, Implem1R, and Implem2R, are described
in detail. All the steps of
the optimization process for the architectures Implem1
and Implem2 are presented. Implem1R and Implem2R are
the corresponding pipelined versions of Implem1 and
Implem2.

The proposed S-Box can be modified to exhibit low
power dissipation, for either full-custom or semi-custom
technologies, by balancing the delays of the logic gates'
inputs and improving several characteristics of the circuit.
This can be achieved by using techniques such as
re-ordering of the components, introduction of redundant
hardware (ideal delay line), reduction of the driving
strength (fan-out), and insertion of registers at highly
unbalanced points of the circuit. One of the main dissipation
sources is the dynamic power dissipation, caused
by the consumption of power either in the logic gates or
on the wires that compose the circuit's network. The first
component (power dissipation in gates) is minimized
if all the transitions on all the inputs of a gate occur
at the same time instant. This is an ideal and unfortunately
only theoretical scenario, in which power is consumed solely
for the calculation of the gate's output. We refer to this
scenario as the 'zero-delay' model.

The second component (power dissipation on the network)
can be minimized only by controlling the outputs of
the logic gates and hence eliminating spurious, undesirable
transitions. This so-called signal glitching not only causes
power consumption on the nets but also triggers
the logic gates at their end, forcing them to calculate a
faulty output.
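
The effect of unbalanced input arrival times can be reproduced with a toy simulation. The Python sketch below compares an 8-input XOR chain with a balanced XOR tree under a unit-delay model, in which every gate output at step t+1 is computed from its inputs at step t. The model, the netlist encoding and all function names are assumptions made for illustration; they are not the simulation flow used for the results in this paper, and real gate and wire delays are not uniform.

```python
def settle(netlist, inputs):
    """Zero-delay evaluation: steady-state value of every node."""
    values = list(inputs)
    for a, b in netlist:               # gates are listed in topological order
        values.append(values[a] ^ values[b])
    return values

def count_toggles(netlist, old, new):
    """Unit-delay simulation of the input change old -> new.

    Returns (total number of gate-output toggles, final value of last gate).
    """
    n = len(old)
    state = settle(netlist, old)
    state[:n] = list(new)              # all primary inputs switch at t = 0
    toggles = 0
    while True:
        nxt = state[:n] + [state[a] ^ state[b] for a, b in netlist]
        changed = sum(p != q for p, q in zip(state, nxt))
        if changed == 0:               # fixed point reached: circuit is stable
            return toggles, state[-1]
        toggles += changed
        state = nxt

def xor_chain(n):
    """((x0 ^ x1) ^ x2) ^ ... : gate j has node id n + j."""
    gates = [(0, 1)]
    for j in range(1, n - 1):
        gates.append((n + j - 1, j + 1))
    return gates

def xor_tree(n):
    """Balanced XOR tree; n must be a power of two."""
    gates, level, next_id = [], list(range(n)), n
    while len(level) > 1:
        upper = []
        for k in range(0, len(level), 2):
            gates.append((level[k], level[k + 1]))
            upper.append(next_id)
            next_id += 1
        level = upper
    return gates

old, new = [0] * 8, [1] * 8
chain_result = count_toggles(xor_chain(8), old, new)
tree_result = count_toggles(xor_tree(8), old, new)
```

In this model both circuits settle to the same parity, but for the all-zeros to all-ones change the chain produces 21 gate-output toggles while the balanced tree produces none, because in the tree both inputs of every gate change in the same step. This is the behaviour that the delay-balancing steps described next try to approach in the real circuit.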

Some common techniques to address the aforementioned
problems are either the use of custom transistor sizes,
with a variety of voltage thresholds and noise
margins, or the modification of the circuit's logic function.
The first approach is trivial for every circuit design
and is applicable to high-speed circuits, mainly targeting
technologies that allow full-custom design. The second
approach can be easily applied to various technologies, and
it is based mainly on the functionality of the circuit and
not on its physical implementation. The higher power savings
in a vast number of VLSI implementations are based on this
high-level design approach. This fact makes it appropriate
for our goal.



In the following paragraphs our proposed methodology
is discussed in detail. The implementation is based on finite
field operations; thus it employs appropriate components
for multiplication, squaring, inversion and XOR operations,
together with some extra logic that enables the encryption
and decryption operation modes. In Figure 6 the logic that
creates the main signal glitches is highlighted. The
contribution of those glitches to the overall power
dissipation is significant.

The network's delays (shaded network in Fig. 6) are balanced
by inserting an extra level of XOR gates. In Figure 7(a)
(Implem1 architecture), the result of this operation is
the creation of the COMP component. This approach
contributes significantly to the elimination of spurious
transitions caused by the uneven distribution of
propagation delays. The resulting structure of the Sbox
seems more balanced as far as glitch generation at the
inputs of the INVERSION component is concerned. However,
this is not actually the case, since the components COMP and
MULGF($2^4$) have different critical paths. Thus, additional
steps to further balance the propagated
signal delays on the main data path are the flattening
of component COMP, the analysis of its functionality, its
optimization in terms of speed and, finally, the creation of
an output datapath with a delay similar to that of the
component MULGF($2^4$).
Fig. 7. The modified Sbox architectures (a) Implem1 and (b) Implem2.

Flattening of the COMP circuit reveals that it consists
only of XOR gates, which is ideal for creating a balanced
XOR tree (Fig. 7(a), Implem1). Analysis of the COMP
component's functionality, keeping in mind that only
XOR logic gates are utilized, reveals that all the output
bits can be expressed solely as a function of the input bits,
without considering any intermediate values. The optimization
of the XOR tree consists of two steps. Firstly,
the logic is simplified using simple Boolean algebra. The
result is quite interesting: porting the Sbox to finite
fields eliminates several dependencies. Thus, the complex
initial logic can be substituted
by small XOR trees. Secondly, extra XOR logic gates are
introduced in order to balance the delays throughout the
XOR trees. The same procedure (insertion of XOR logic
gates) is followed in order to equalize the output delays of
the components COMP and MULGF($2^4$).

A further optimization (Implem2) for decreasing the 
power dissipation is to modify several characteristics of the 
circuit. A critical issue of the initial implementation was 
the high fan-out of the component MAP. The use of buffers 
to form the appropriate driver (superbuffer), increases not 
only the cost but also the propagation delay. Thus the opti- 
mization can be focused on reducing the fan-out of the 
component MAP. This can be achieved with two different 
approaches. The first approach is to duplicate the com- 
ponent MAP and share the fan-out to the two replicas. 
Although this is straightforward, the tradeoff between a 
superbuffer and two MAP components is not fair, in terms 
of cost. The other approach, which is not always applicable,
is to transform the component COMP so that its
inputs are the inputs of component MAP, by correlating the
two sets of inputs and discovering their dependencies.

The result of the latter optimization is illustrated in
Figure 7(b) (Implem2). To make clearer how the
component COMP is reduced to its flattened, small-sized
implementation, the series of simplifications using Boolean
algebra (the known property that $a(i) \oplus a(i) = 0$) is
given below. In Figure 8, the nodes of the circuit that
can be optimized are numbered; the analysis and
simplification process for each node is as follows:



Node 2 

ah(0) = fl(5)©a(4)©fl(6) 

ah{\) = a(l)®a(7)®a(4)®a(6) 

ah(2) = a(5)®a(l)®a(2)®a(3) 

ah(3) = 0(5)00(7) 

Node 3 

sq(0) = ah(0)®ah(2) 



(2) 



0(5)00(4)00(6)00(5)00(7)00(2)00(3) 
0(4)00(6)00(7)00(2)00(3) 



sq{\) = ah(2) = a(5)®a(l)®a(2)®a(3) 

sq{2) = ah(l)®ah(3) 

= 0(1)00(7)00(4)00(6)00(5)00(7) 
= 0(1)00(4)00(5)00(6) 



(3) 



sq(3) = ah(3) = a(5)®a(7) 
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(5) 



Node 4 

e(0) = sq(l)®sq(2)®sq(3) 

= a(5)0a(7)0a(2)0fl(3) 

= a(2)0a(3)0a(4)0fl(5)0fl(6)0fl(l) 

e(l) = sq(0)®sq(l) 

= a{4)®a{6)®a{l)®a{2)®a{3) 

®a(5)®a(l)®a(2)®a(3) 
= a(4)0fl(5)0fl(6) 

e(2) = sq(0)®sq(l)®sq(2) ( 4 ) 

= a(4)0a(6)0fl(7) 

®a(2)®a(3)®a(5)®a(7)®a(2)®a(3) 
®a(l)®a(4)®a(5)®a(6) = a(l) 

e{3) = sq(0)®sq(l)®sq(2)®sq(3) 

= a(4)0fl(6)0fl(7) 

®a(2)®a(3)®a(5)®a(l) ®a{2) ®a{3) 
®a(l)®a(4)®a(5)®a(6)®a(5)®a(l) 

= a(l)®a(5)®a(l) 

Node 5 

al(0) = a(4)®a(6)®a(0)®a(5) 

al(l) = a(l)®a(2) 
al{2) = a(l)®a(l) 
al{3) = fl(2)0a(4) 
Node 6 

^'(0) = al(0)®al(2) 

= a(4)0a(6)0a(O)0fl(5)0fl(l)0fl(7) 
= a(0)®a(l)®a(4)®a(5)®a(6)®a(l) 

sq'(l) = al(2) = a(l)®a(l) ( 6 ) 

sq\2) = al(l)®al(3) 

= a(l)®a(2)®a(2)®a(4) = a(l)®a(4) 

sq\3) = al(3) = a(2)®a(4) 
Node 7 

x(0) = e(0)®sq , (0) = a(0)®a(2)®a(3)®a(l) 

x(l) = e(l)®sq , (l) = a(l)®a(l)®a(4)®a(5)®a(6) 

x{2) = e(2)®sq'(2) = a(4) 

x(3) = e(3)®sq'(3) = a(2)®a(4)®a(l)®a(5)®a(l) 

(7) 
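
The cancellations used in the node equations can be checked mechanically. In the Python sketch below, each signal is represented as an 8-bit mask whose set bits are the input bits a(0)..a(7) that appear in its XOR expression, so that XOR-ing two masks automatically applies a(i) ⊕ a(i) = 0. The starting expressions for Node 2 and Node 5 and the wiring of Nodes 3, 4, 6 and 7 are taken from the equations of this section; the helper names (`mask`, `terms`, `square`, `times_e`, `parity`) are illustrative labels only.

```python
def mask(*bits):
    """Encode a set of input-bit indices as an 8-bit mask."""
    m = 0
    for b in bits:
        m ^= 1 << b            # a repeated index cancels, as a(i) ^ a(i) = 0
    return m

def terms(m):
    """Indices of the input bits present in a mask."""
    return [i for i in range(8) if (m >> i) & 1]

def square(v):
    """Wiring of Node 3 (and Node 6): sq from ah (sq' from al)."""
    return [v[0] ^ v[2], v[2], v[1] ^ v[3], v[3]]

def times_e(v):
    """Wiring of Node 4: e from sq."""
    return [v[1] ^ v[2] ^ v[3],
            v[0] ^ v[1],
            v[0] ^ v[1] ^ v[2],
            v[0] ^ v[1] ^ v[2] ^ v[3]]

def parity(m, byte):
    """Evaluate the XOR expression encoded by mask m on a concrete input byte."""
    return bin(m & byte).count("1") & 1

AH = [mask(5, 4, 6), mask(1, 7, 4, 6), mask(5, 7, 2, 3), mask(5, 7)]   # Node 2
AL = [mask(4, 6, 0, 5), mask(1, 2), mask(1, 7), mask(2, 4)]            # Node 5

SQ = square(AH)                                   # Node 3
E = times_e(SQ)                                   # Node 4
SQ_L = square(AL)                                 # Node 6
X = [e ^ s for e, s in zip(E, SQ_L)]              # Node 7

simplified = [terms(x) for x in X]
```

Here `simplified` lists, for each output x(0)..x(3), the input bits that survive the simplification, which can be compared directly with the right-hand sides of the Node 7 equations.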
In our application, the latter optimization approach can
be effective. Initially, there was a fan-out problem at the
component MAP. However, the simplification of the functions
in COMP results in a modified component COMP2,
which is driven by the inputs of the component MAP
instead of its outputs. This is critical for the size of the
transistors driven by the outputs of the MAP component,
as the requirements for a good driving strength are now
relaxed. Furthermore, the outputs of the modified component
respond very fast, offering a greater degree
of freedom to adjust the propagation delay of the COMP2
signals appropriately. This results in balancing the
propagation delay of the whole circuitry, thus maximizing the
power savings when compared to non-pipelined implementations
(lookup tables, arithmetic implementations).

Fig. 8. Circuit that can be simplified.

The implementations Implem1R and Implem2R are realized
in the final optimization step, which increases the power
savings even more by introducing several pipeline stages
throughout the circuit. The insertion of pipeline registers
eliminates, at the stage of insertion, the propagation of
glitches. This is a useful technique for fully controlling
power dissipation, at the appropriate cost. Considering
that the preceding optimization steps
have reduced the area requirements in terms of logic, driving
instances and transistor sizes, it is fair to say that the
tradeoff between cost and power decrease satisfies overall
the initial application constraints (mobile end user).

The power savings when inserting the pipeline registers
at the appropriate locations are expected to be at least 2 times
higher than those achieved with a non-pipelined implementation.
This estimate is fair for circuitry that cannot be
easily balanced in terms of propagation delay. However,
in our application, where the optimization steps have
significantly balanced the propagation delay of the signals
throughout the circuitry, the power savings are expected to be
much higher for a fair penalty in cost. As shown
below, where the experimental results of our implementations
are presented, the overall power dissipation is one
order of magnitude lower than that of the non-pipelined
version.

6. COMPARISONS AND EXPERIMENTAL 
RESULTS 

All implementations have been synthesized with Synopsys
synthesis tools (Design Compiler) on a Linux platform,
and the target technology library is 0.18 µm with 1.8 V
core voltage. All post-synthesis simulations and the generation
of the switching activity information have been done at
the logic gate level with 1 ps resolution, using the VHDL
simulator ModelSim of Mentor Graphics. The total power
dissipation is measured using Synopsys PrimePower.

We implement six Sbox approaches in VHDL. The
lookup table approach and the Wolkerstorfer approach are
implemented for comparison reasons. The other four
implementations constitute four distinct optimization steps
that can be followed in order to arrive at a final optimized
approach. In Figures 10-13 the main problems of the
arithmetic implementation of Wolkerstorfer [13] are highlighted.
Those problems are the high power dissipation and critical
path delay. These parameters make the area optimization
of Ref. [13] useless, since the trade-off in power and delay
is unacceptable. Furthermore, as shown in Figures 10 and 11,
although the power dissipation of the lookup table
implementation is low, its area resources are very high
compared to all the other implementations. Figures 10-13
also present implementation results for the four proposed
implementations. Without the use of any register, very
satisfactory results are achieved for implementation 2
(Implem2). Its Power x Area x Delay (PAD) product, which
offers a fair measure of the achieved optimization degree,
is comparable to that of the conventional lookup table
implementation (2.14 versus 1.66 in Table I), while achieving
a much smaller chip covered area at somewhat higher power
dissipation. The use of registers, as introduced in
implementations Implem1R and Implem2R, gives results that are
better by about one order of magnitude in comparison to all
the other implementations, with the exception of the chip
covered area. The implementation results are shown analytically
in Table I. From this table, it can be concluded that the
proposed architecture that best fits our goals
of low power and low area constraints without using registers
is implementation 2 (Implem2). The Power x Area
(PA) product, which offers a fair measure of the achieved
optimization degree under the low area and low power
constraints, has its lowest value among the register-free
cases when implementation 2 is used.

Fig. 9. The (a) Implem2 and (b) Implem3 Sbox architectures with
registers.
Fig. 10. Area comparisons of AES Sbox architectures.
Fig. 11. Power comparisons of AES Sbox architectures.
Fig. 12. Delay comparisons of AES Sbox architectures.

Table I. Implementation comparisons of AES Sbox architectures.

                     Power      Area        Delay   P x A     P x A x D
                     (W)        (cells)     (ns)    product   product
Lookup tables        0.00056    1009.2384   2.93    0.56504   1.655958366
Wolkerstorfer [13]   0.0023     362.2191    7.33    0.8331    6.106651807
Implem1              0.00215    343.84275   6.69    0.7392    4.94556622
Implem2              0.001007   344.5444    6.17    0.3469    2.140719821
Implem1R             0.0003     708.977     2.15    0.2126    0.45729017
Implem2R             0.0001     712.6414    1.93    0.0712    0.1375398

The product columns correspond to the power expressed in mW.



7. CONCLUSION 

In this paper several implementations of the AES Sbox
transformation were proposed. Using arithmetic implementation
logic to achieve low chip covered area,
we enhanced our implementations with low power techniques
in order to reduce their power dissipation.
Comparison of the proposed implementations
with other similar works confirms the benefits of the
proposed designs in low power, low chip covered area
systems. Such implementations can be used efficiently in
mobile devices, where the need for low power dissipation
and small chip covered area is great.